Dataset summary: Dengue - grouped by patient

Report generated using dataprep.

Dengue dataset report

Overview

Dataset Statistics

Number of Variables 14
Number of Rows 14484
Missing Cells 0
Missing Cells (%) 0.0%
Duplicate Rows 32
Duplicate Rows (%) 0.2%
Total Size in Memory 3.3 MB
Average Row Size in Memory 235.4 B

Variable Types

Categorical 9
Numerical 5

Variables

dsource

categorical

Distinct Count 10
Unique (%) 0.1%
Missing 0
Missing (%) 0.0%
Memory Size 1.7 MB

Length

Mean 3.1826
Standard Deviation 0.9882
Median 4
Minimum 2
Maximum 5

Sample

1st row 01nva
2nd row 01nva
3rd row 01nva
4th row 01nva
5th row 01nva

Letter

Count 28967
Lowercase Letter 28967
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 17130

age

numerical

Distinct Count 27
Unique (%) 0.2%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 1.0 MB
Mean 8.2995
Minimum 0
Maximum 18
Zeros 4
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%

Quantile Statistics

Minimum 0
5-th Percentile 2
Q1 5
Median 8
Q3 11
95-th Percentile 14
Maximum 18
Range 18
IQR 6

Descriptive Statistics

Mean 8.2995
Standard Deviation 3.9859
Variance 15.8876
Sum 120210.5
Skewness -0.02044
Kurtosis -0.8352
Coefficient of Variation 0.4803
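
Several entries in the age summary are derived from the others, so they can be cross-checked by hand. The sketch below recomputes them from the figures printed above. It assumes the usual definitions (range = max - min, IQR = Q3 - Q1, variance = std squared, coefficient of variation = std / mean); dataprep's exact formulas are not shown on this page, and small differences in the last digit come from the rounding of the printed values.

```python
# Values copied from the age section and the overview of the report.
n_rows = 14484
total = 120210.5
minimum, maximum = 0, 18
q1, q3 = 5, 11
std = 3.9859

mean = total / n_rows          # reported as 8.2995
value_range = maximum - minimum  # reported as 18
iqr = q3 - q1                  # reported as 6
variance = std ** 2            # reported as 15.8876 (from the unrounded std)
cv = std / mean                # reported as 0.4803

print(f"mean={mean:.4f} range={value_range} iqr={iqr} "
      f"variance={variance:.4f} cv={cv:.4f}")
```

The same checks apply to weight, plt, haematocrit_percent and body_temperature by swapping in their printed values.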

gender

categorical

Distinct Count 2
Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 1.8 MB

Length

Mean 4.8737
Standard Deviation 0.992
Median 4
Minimum 4
Maximum 6

Sample

1st row Male
2nd row Female
3rd row Female
4th row Male
5th row Female

Letter

Count 70590
Lowercase Letter 56106
Space Separator 0
Uppercase Letter 14484
Dash Punctuation 0
Decimal Number 0

weight

numerical

Distinct Count 353
Unique (%) 2.4%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 1.0 MB
Mean 28.4854
Minimum 7.2
Maximum 114
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%

Quantile Statistics

Minimum 7.2
5-th Percentile 12
Q1 19
Median 26
Q3 37
95-th Percentile 52
Maximum 114
Range 106.8
IQR 18

Descriptive Statistics

Mean 28.4854
Standard Deviation 12.8
Variance 163.8408
Sum 412582.6
Skewness 0.8739
Kurtosis 0.8789
Coefficient of Variation 0.4494

bleeding

categorical

Distinct Count 2
Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 1.8 MB

Length

Mean 4.7429
Standard Deviation 0.4371
Median 5
Minimum 4
Maximum 5

Sample

1st row True
2nd row False
3rd row True
4th row False
5th row False

Letter

Count 68696
Lowercase Letter 54212
Space Separator 0
Uppercase Letter 14484
Dash Punctuation 0
Decimal Number 0

plt

numerical

Distinct Count 1144
Unique (%) 7.9%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 1.0 MB
Mean 167.1835
Minimum 3
Maximum 829
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%

Quantile Statistics

Minimum 3
5-th Percentile 24
Q1 71
Median 169
Q3 243
95-th Percentile 338
Maximum 829
Range 826
IQR 172

Descriptive Statistics

Mean 167.1835
Standard Deviation 104.5549
Variance 10931.7272
Sum 2.4215e+06
Skewness 0.4243
Kurtosis -0.1921
Coefficient of Variation 0.6254

shock

categorical

Distinct Count 2
Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 1.8 MB

Length

Mean 4.9516
Standard Deviation 0.2146
Median 5
Minimum 4
Maximum 5

Sample

1st row True
2nd row True
3rd row True
4th row True
5th row True

Letter

Count 71719
Lowercase Letter 57235
Space Separator 0
Uppercase Letter 14484
Dash Punctuation 0
Decimal Number 0

haematocrit_percent

numerical

Distinct Count 560
Unique (%) 3.9%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 1.0 MB
Mean 41.3015
Minimum 21
Maximum 67.05
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%

Quantile Statistics

Minimum 21
5-th Percentile 33.5
Q1 37.2
Median 40.3
Q3 45
95-th Percentile 52
Maximum 67.05
Range 46.05
IQR 7.8

Descriptive Statistics

Mean 41.3015
Standard Deviation 5.6372
Variance 31.7783
Sum 598210.2593
Skewness 0.6312
Kurtosis 0.1445
Coefficient of Variation 0.1365

bleeding_gum

categorical

Distinct Count 2
Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 1.8 MB

Length

Mean 4.8903
Standard Deviation 0.3125
Median 5
Minimum 4
Maximum 5

Sample

1st row True
2nd row False
3rd row True
4th row False
5th row False

Letter

Count 70831
Lowercase Letter 56347
Space Separator 0
Uppercase Letter 14484
Dash Punctuation 0
Decimal Number 0

abdominal_pain

categorical

Distinct Count 2
Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 1.8 MB

Length

Mean 4.682
Standard Deviation 0.4657
Median 5
Minimum 4
Maximum 5

Sample

1st row True
2nd row True
3rd row True
4th row True
5th row True

Letter

Count 67814
Lowercase Letter 53330
Space Separator 0
Uppercase Letter 14484
Dash Punctuation 0
Decimal Number 0

ascites

categorical

Distinct Count 2
Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 1.8 MB

Length

Mean 4.8391
Standard Deviation 0.3675
Median 5
Minimum 4
Maximum 5

Sample

1st row False
2nd row False
3rd row False
4th row False
5th row False

Letter

Count 70089
Lowercase Letter 55605
Space Separator 0
Uppercase Letter 14484
Dash Punctuation 0
Decimal Number 0

bleeding_mucosal

categorical

Distinct Count 2
Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 1.8 MB

Length

Mean 4.8159
Standard Deviation 0.3876
Median 5
Minimum 4
Maximum 5

Sample

1st row False
2nd row False
3rd row True
4th row False
5th row False

Letter

Count 69754
Lowercase Letter 55270
Space Separator 0
Uppercase Letter 14484
Dash Punctuation 0
Decimal Number 0

bleeding_skin

categorical

Distinct Count 2
Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 1.8 MB

Length

Mean 4.5429
Standard Deviation 0.4982
Median 5
Minimum 4
Maximum 5

Sample

1st row False
2nd row False
3rd row True
4th row False
5th row False

Letter

Count 65800
Lowercase Letter 51316
Space Separator 0
Uppercase Letter 14484
Dash Punctuation 0
Decimal Number 0
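
The report gives no class counts for the two-valued columns, but they can be recovered from the letter totals. The sketch below assumes every value is spelled exactly as in the samples ("True"/"False", "Male"/"Female") and that each "Letter Count" is the total over all 14484 rows. With two labels of known lengths, the count of the shorter label follows from one linear equation. The helper name count_shorter is hypothetical.

```python
N_ROWS = 14484  # "Number of Rows" in the overview


def count_shorter(total_letters, n_rows, short_len, long_len):
    """Solve short_len*a + long_len*b = total_letters with a + b = n_rows for a."""
    numerator = long_len * n_rows - total_letters
    count, remainder = divmod(numerator, long_len - short_len)
    if remainder or not 0 <= count <= n_rows:
        raise ValueError("letter total is inconsistent with two fixed-length labels")
    return count


# "Letter Count" of each True/False column, copied from the report.
letter_totals = {
    "bleeding": 68696,
    "shock": 71719,
    "bleeding_gum": 70831,
    "abdominal_pain": 67814,
    "ascites": 70089,
    "bleeding_mucosal": 69754,
    "bleeding_skin": 65800,
}

# "True" has 4 letters and "False" has 5.
true_counts = {
    name: count_shorter(total, N_ROWS, 4, 5)
    for name, total in letter_totals.items()
}
# "Male" has 4 letters and "Female" has 6.
male_count = count_shorter(70590, N_ROWS, 4, 6)

for name, count in true_counts.items():
    print(f"{name}: {count} True ({count / N_ROWS:.1%})")
print(f"gender: {male_count} Male, {N_ROWS - male_count} Female")
```

The result can be checked against the reported mean lengths: for a True/False column the share of True equals 5 minus the mean length, for example 5 - 4.9516 = 0.0484 for shock.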

body_temperature

numerical

Distinct Count 1218
Unique (%) 8.4%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 1.0 MB
Mean 37.8323
Minimum 35
Maximum 41.5
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%

Quantile Statistics

Minimum 35
5-th Percentile 37
Q1 37.2
Median 37.58
Q3 38.3333
95-th Percentile 39.5
Maximum 41.5
Range 6.5
IQR 1.1333

Descriptive Statistics

Mean 37.8323
Standard Deviation 0.8257
Variance 0.6818
Sum 547963.5822
Skewness 0.9208
Kurtosis 0.2582
Coefficient of Variation 0.02183

Interactions

Correlations

Missing Values



import pandas as pd
from dataprep.eda import create_report
from pkgname.utils.data_loader import load_dengue, IQR_rule
from pkgname.utils.print_utils import suppress_stdout, suppress_stderr

# Columns shown in the report (study_no is loaded only as the grouping key).
features = ["dsource", "age", "gender", "weight", "bleeding", "plt",
            "shock", "haematocrit_percent", "bleeding_gum", "abdominal_pain",
            "ascites", "bleeding_mucosal", "bleeding_skin", "body_temperature"]

# Silence both stdout and stderr while loading the data and building the report.
with suppress_stdout(), suppress_stderr():

    df = load_dengue(usecols=['study_no'] + features)

    # Forward-fill gaps within each patient. The chained bfill() is not
    # grouped: it runs over the whole column.
    for feat in features:
        df[feat] = df.groupby('study_no')[feat].ffill().bfill()

    # Keep records with age <= 18 and no remaining missing values.
    df = df.loc[df['age'] <= 18]
    df = df.dropna()

    # Collapse the records to one row per patient.
    df = df.groupby(by="study_no", dropna=False).agg(
        dsource=pd.NamedAgg(column="dsource", aggfunc="last"),
        age=pd.NamedAgg(column="age", aggfunc="max"),
        gender=pd.NamedAgg(column="gender", aggfunc="first"),
        weight=pd.NamedAgg(column="weight", aggfunc="mean"),
        bleeding=pd.NamedAgg(column="bleeding", aggfunc="max"),
        plt=pd.NamedAgg(column="plt", aggfunc="min"),
        shock=pd.NamedAgg(column="shock", aggfunc="max"),
        haematocrit_percent=pd.NamedAgg(column="haematocrit_percent", aggfunc="max"),
        bleeding_gum=pd.NamedAgg(column="bleeding_gum", aggfunc="max"),
        abdominal_pain=pd.NamedAgg(column="abdominal_pain", aggfunc="max"),
        ascites=pd.NamedAgg(column="ascites", aggfunc="max"),
        bleeding_mucosal=pd.NamedAgg(column="bleeding_mucosal", aggfunc="max"),
        bleeding_skin=pd.NamedAgg(column="bleeding_skin", aggfunc="max"),
        body_temperature=pd.NamedAgg(column="body_temperature", aggfunc="mean"),
    ).dropna()

    # Apply the package's IQR_rule helper to the plt column.
    df = IQR_rule(df, ['plt'])

    report = create_report(df, title="Dengue dataset report")

# Display the report.
report
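
The script collapses repeated records into one row per patient with groupby and named aggregations. A minimal sketch of that pattern on made-up data follows; the values and the patient ids p1 and p2 are hypothetical, and only three of the columns are used. Unlike the script, the sketch fills gaps with transform, so that both the forward and the backward fill stay inside each patient's own records.

```python
import pandas as pd

# Toy data (hypothetical values): two patients with repeated records.
records = pd.DataFrame({
    "study_no": ["p1", "p1", "p1", "p2", "p2"],
    "plt": [150.0, None, 90.0, None, 210.0],
    "shock": [False, True, False, False, False],
    "weight": [20.0, 20.5, 21.0, 35.0, 35.0],
})

# Fill gaps within each patient only, so values never cross patients.
records["plt"] = (
    records.groupby("study_no")["plt"]
    .transform(lambda s: s.ffill().bfill())
)

# One row per patient: lowest plt, whether shock ever occurred, mean weight.
per_patient = records.groupby("study_no").agg(
    plt=pd.NamedAgg(column="plt", aggfunc="min"),
    shock=pd.NamedAgg(column="shock", aggfunc="max"),
    weight=pd.NamedAgg(column="weight", aggfunc="mean"),
)

print(per_patient)
```

Taking "max" of a boolean column marks a patient as True if the event was recorded on any row, which matches how the script aggregates shock and the bleeding flags.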

Total running time of the script: ( 0 minutes 5.414 seconds)
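
The IQR_rule helper comes from a private package and its body is not part of this page, so its exact behaviour is unknown. As an illustration only, the sketch below shows one common form of an interquartile-range filter, Tukey's rule, which keeps values inside [Q1 - k*IQR, Q3 + k*IQR]. The function name iqr_filter, the default k=1.5 and the toy values are all assumptions, and the sketch is not claimed to reproduce the filtering behind the report above.

```python
import pandas as pd


def iqr_filter(df, columns, k=1.5):
    """Keep rows whose values in `columns` lie within [Q1 - k*IQR, Q3 + k*IQR].

    Hypothetical stand-in for an IQR-based outlier filter; not the
    package's IQR_rule.
    """
    keep = pd.Series(True, index=df.index)
    for col in columns:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        keep &= df[col].between(q1 - k * iqr, q3 + k * iqr)
    return df[keep]


# Toy data (hypothetical values) with one extreme measurement.
toy = pd.DataFrame({"plt": [60, 70, 80, 90, 100, 110, 120, 900]})
filtered = iqr_filter(toy, ["plt"])

print(filtered["plt"].tolist())
```

A larger k makes the filter more permissive, so the multiplier should be taken from the real helper before comparing results.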

Gallery generated by Sphinx-Gallery